Read in the previously processed/cleaned zip code data in Stata’s format, and begin to assign exposure peiod variables by race.
foo <- read.dta("~/Wildfires_kurt/race eth of asthma ED visits by zip day_exp merged.dta") %>% mutate(wildfire = ifelse(pm_25 > 0, "WFperiod", "nonWFperiod")) %>%
rename(zip = bene_addr_zip)
First we can look at the concentrations in each zip code and day.
Next we’ll classify each day-zip combination as experiencing some wildfire exposure or not. Wildfire period is defined as any day where the zip code has a PM2.5 value greater than zero (note: some are very low).
Now we can begin to calculate RR, first for the entire zip codes (not race stratified). We’ll use a constant of o.5 to fill in spots where there are no cases during the exposure or control periods.
constant <- 0.5
rates_overall <- foo %>%
mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
group_by(zip, wildfire) %>%
summarise(cases = sum(total),
days = length(svc_from_dt)) %>%
mutate(casesPerDay = cases/days)
RR_overall <- foo %>%
mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
group_by(zip, wildfire) %>%
summarise(cases = sum(total),
days = length(svc_from_dt)) %>%
mutate(cases = ifelse(cases == 0, constant, cases),
casesPerDay = cases/days) %>%
select(zip, wildfire, casesPerDay) %>%
spread(key = wildfire, value = casesPerDay) %>%
mutate(RateRatio = WFperiod/nonWFperiod,
ln_RateRatio = log(RateRatio)) %>%
rename(WF_rate = WFperiod,
nonWF_rate = nonWFperiod)
DT::datatable({rates_overall}) %>% DT::formatRound(5,5)
DT::datatable({RR_overall}) %>% DT::formatRound(c(2:5),3)
Plot Overall rate ratios
## Calculating race-specific RRNow we can begin to calculate race-specific RR.
RR_race <- foo %>%
group_by(zip, wildfire) %>%
summarise(white = sum(white),
hispanic = sum(hispanic),
black = sum(black),
native_am = sum(native_am),
asian_pi = sum(asian_pi),
other = sum(other),
days = length(svc_from_dt)) %>%
gather(white, hispanic, black, native_am, asian_pi,other, key = race, value = cases) %>%
mutate(cases = ifelse(cases == 0, constant, cases),
casesPerDay = cases/days) %>%
select(zip, wildfire, casesPerDay, race) %>%
spread(key = wildfire, value = casesPerDay) %>%
mutate(RateRatio = WFperiod/nonWFperiod,
ln_RateRatio = log(WFperiod/nonWFperiod))
rates_overall <- foo %>%
mutate(total = white + hispanic + black + native_am + asian_pi + other + missing) %>%
group_by(zip, wildfire) %>%
summarise(cases = sum(total),
days = length(svc_from_dt)) %>%
mutate(casesPerDay = cases/days)
DT::datatable({RR_race}) %>% DT::formatRound(c(3:6),3)
Look at the distrubutions of race_specific rate ratios, need to look into why so many of these are focused around 1.5…?